-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamically extract the download URL from the feed #47
Conversation
Extract the download link.
Add depenency
ctyparser/bigcty.py
Outdated
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10 | ||
page = session.get(update_url) | ||
tree = html.fromstring(page.content) | ||
dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0] | ||
rq = session.get(dl_url) | ||
if rq.status_code == 404: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this codepath still necessary? I think it exists because there was a change in the feed bafa05d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that fallback is needed, but very differently, and only if the download URL extraction fails (see other comment)
ctyparser/bigcty.py
Outdated
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10 | ||
page = session.get(update_url) | ||
tree = html.fromstring(page.content) | ||
dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if tree.xpath()
doesn't find anything? would [0]
IndexError
? we should handle that, perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
@@ -172,7 +173,9 @@ def update(self) -> bool: | |||
|
|||
with tempfile.TemporaryDirectory() as temp: | |||
path = pathlib.PurePath(temp) | |||
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10 | |||
page = session.get(update_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably check for status before trying to parse the content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHANGELOG.md needs to be updated
Your commits will also need to be squashed together, and the resulting commit should describe what it does
@@ -172,7 +173,9 @@ def update(self) -> bool: | |||
|
|||
with tempfile.TemporaryDirectory() as temp: | |||
path = pathlib.PurePath(temp) | |||
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to keep that URL as a fallback, like the other URL format already is
ctyparser/bigcty.py
Outdated
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10 | ||
page = session.get(update_url) | ||
tree = html.fromstring(page.content) | ||
dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0] | ||
rq = session.get(dl_url) | ||
if rq.status_code == 404: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that fallback is needed, but very differently, and only if the download URL extraction fails (see other comment)
@@ -7,3 +7,4 @@ sphinx | |||
# Dependencies | |||
feedparser | |||
requests | |||
lxml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is lxml not needed as a normal dependency? This file is only used for development, so lxml
should also be added to setup.py (setup.py being what defines dependencies to install when someone install ctyparser)
Sorry, I think I'm creating more of a problem for you than what I'm solving. |
Description
Use lxml xpath to pull download link off of web page to clear issue #10
Fixes #10
Type of change
How has this been tested?
By running it.
Checklist
DEVELOPING.md
, if it exists)CHANGELOG.md
updated if needed